The following will make sure you have what you need to run the rest of the code.
Machine Learning
Use the caret package to fit machine learning models.
Start with some pre-processing of the data: split it into training and test sets.
Example with XGBoost
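library(caret)    # provides trainControl(); install.packages('caret') if needed
library(xgboost)  # install.packages('xgboost') if needed

# tuning grid: every combination of the values below (2 * 2 * 2 * 3 = 24 candidates)
xgb_opts = expand.grid(
  eta = c(.3, .4),
  max_depth = c(9, 12),
  colsample_bytree = c(.6, .8),
  subsample = c(.5, .75, 1),
  nrounds = 100,  # 1000 would be more reasonable, but notably time consuming
  min_child_weight = 1,
  gamma = 0
)

# 10-fold cross-validation
cv_opts = trainControl(method='cv', number=10)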
Run in parallel
Python
With machine learning, we finally reach an area where Python is on par with R and typically surpasses it.
Most techniques that fall under the heading of machine learning are developed in Python first.
For at least some techniques, Python will typically run faster, possibly notably so, but this depends on many factors.
Random forest
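A minimal sketch of tuning a random forest with scikit-learn's `RandomForestClassifier` and `GridSearchCV`. The course data is not available here, so a synthetic data set from `make_classification` stands in for it; the sample size, the number of trees, and the `max_features` grid are illustrative assumptions rather than recommended settings.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the real data (assumption: binary outcome, 8 numeric features)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=1234
)

# Hold out 20% of the rows for testing, stratified on the outcome
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1234
)

# Kept small so the sketch runs quickly; more trees are common in practice
rf = RandomForestClassifier(n_estimators=50, random_state=1234)

# Tuning parameter: number of features considered at each split
rf_opts = {'max_features': np.arange(2, 7)}

# 10-fold cross-validation over the grid; n_jobs could be set to run folds in parallel
rf_estimator = GridSearchCV(rf, param_grid=rf_opts, cv=10)
results_rf = rf_estimator.fit(X_train, y_train)
```

By default `GridSearchCV` refits the best setting on the full training set, so `results_rf` can be used directly for prediction afterwards.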
Inspect the best result over the tuning parameters
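As a rough illustration, the fitted search object exposes the best cross-validated score and the parameter setting that produced it. The sketch below refits a small search on synthetic data so that it runs on its own; the data and the grid values are assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data (assumption)
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=1234
)

search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=1234),
    param_grid={'max_features': [2, 4, 6]},
    cv=5,
)
results_rf = search.fit(X, y)

print(results_rf.best_score_)   # mean cross-validated accuracy of the best setting
print(results_rf.best_params_)  # the setting that achieved it

# Mean cross-validated score for every candidate, to see how close the alternatives were
for params, score in zip(results_rf.cv_results_['params'],
                         results_rf.cv_results_['mean_test_score']):
    print(params, round(score, 3))
```

Looking at all candidates, not only the winner, helps judge whether the tuning parameter matters much for the data at hand.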
Test model on new data
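One way to check the model on held-out data is to predict the test rows and summarize the result with `sklearn.metrics`. This is a sketch under assumptions: the data is synthetic, and a plain `RandomForestClassifier` with fixed settings stands in for the tuned model.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption)
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4, random_state=1234
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=1234
)

# Fixed settings for illustration; a tuned search object exposes the same predict()
results_rf = RandomForestClassifier(n_estimators=100, random_state=1234)
results_rf.fit(X_train, y_train)

# Predict rows the model has not seen
rf_predict = results_rf.predict(X_test)

# Precision, recall, and F1 per class, then the raw confusion matrix
print(metrics.classification_report(y_test, rf_predict))
print(metrics.confusion_matrix(y_test, rf_predict))
```

The test rows play no part in fitting or tuning, so these metrics give a fairer picture than the cross-validated score alone.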